
Robj/deadlock fix #644

Merged
rtjohnso merged 6 commits into main from robj/deadlock-fix on Jul 4, 2025

@rtjohnso
Collaborator

@rtjohnso rtjohnso commented Jul 4, 2025

This patch fixes two deadlocks, which also enables threads to perform updates to SplinterDB while they are performing asynchronous queries.

The first deadlock was caused by trunk_merge_lookup acquiring two read locks on the root. This is deadlock-prone because, if another thread begins to GC the root, it sets the writer bit on the root lock and then waits for the readers to clear out. If the query code has already acquired its first read lock on the root and attempts to acquire its second read lock after the writer bit is set, the query waits for the writer while the writer waits for the query's first read lock, so the query deadlocks with the GC thread.

The fix is to acquire a lock on the root only once.
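To make the interleaving concrete, here is a minimal single-threaded sketch. It assumes a writer-preferring reader/writer lock that refuses new readers once the writer bit is set; `toy_rwlock`, `simulate_lookup_vs_gc`, and the other names are hypothetical and do not reflect SplinterDB's actual lock code. The sketch uses a try-acquire instead of a blocking one, so it can report whether a blocking acquire would have waited forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy writer-preferring reader/writer lock (hypothetical, not SplinterDB's).
 * High bit: a writer has announced itself. Low bits: number of readers. */
#define TOY_WRITER_BIT 0x80000000u

typedef struct {
   atomic_uint state;
} toy_rwlock;

/* Take a read lock unless a writer has set the writer bit. */
static bool toy_try_read_lock(toy_rwlock *l)
{
   unsigned int old = atomic_load(&l->state);
   while (!(old & TOY_WRITER_BIT)) {
      /* On failure, `old` is reloaded and the writer bit is re-checked. */
      if (atomic_compare_exchange_weak(&l->state, &old, old + 1)) {
         return true;
      }
   }
   return false;
}

static void toy_read_unlock(toy_rwlock *l)
{
   unsigned int old = atomic_fetch_sub(&l->state, 1u);
   assert((old & ~TOY_WRITER_BIT) > 0); /* caller must hold a read lock */
   (void)old;
}

/* GC thread, step 1: set the writer bit so no new readers get in. */
static void toy_announce_writer(toy_rwlock *l)
{
   atomic_fetch_or(&l->state, TOY_WRITER_BIT);
}

/* GC thread, step 2: it may proceed only once all readers are gone. */
static bool toy_readers_drained(toy_rwlock *l)
{
   return (atomic_load(&l->state) & ~TOY_WRITER_BIT) == 0;
}

/* Replays the interleaving for a lookup that read-locks the root
 * `locks_taken` times (2 = old behavior, 1 = fixed). Returns true if the
 * lookup and the GC thread would wait on each other forever. */
static bool simulate_lookup_vs_gc(int locks_taken)
{
   toy_rwlock root;
   atomic_init(&root.state, 0u);
   int  held  = 0;
   bool stuck = false;

   if (toy_try_read_lock(&root)) {
      held++;
   }
   toy_announce_writer(&root); /* another thread begins to GC the root */

   for (int i = 1; i < locks_taken; i++) {
      if (!toy_try_read_lock(&root)) {
         /* A blocking acquire would wait for the writer here, while the
          * writer waits for the read lock this lookup already holds. */
         stuck = !toy_readers_drained(&root);
         break;
      }
      held++;
   }

   for (; held > 0; held--) {
      toy_read_unlock(&root);
   }
   /* Without a deadlock, the GC thread can now proceed. */
   return stuck || !toy_readers_drained(&root);
}
```

`simulate_lookup_vs_gc(2)` models the old two-lock lookup and reports a deadlock; `simulate_lookup_vs_gc(1)` models the fix, where the GC thread simply waits for the lookup to release its single read lock and then proceeds.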

The second deadlock could occur when a thread that has asynchronous queries in flight performs an update that results in a GC. In this case, one of the thread's own async queries may hold a read lock on a node that is about to be GCed, but the GC code will wait forever for that read lock to be released, so the thread deadlocks with itself.

The fix in this case is to defer GC until there are no more readers, so there is now a queue of pending GC tasks in the trunk context. Whenever GC reaches a node that is read-locked, it stops and puts that node on the queue. Every trunk modification checks the queue and performs any queued GCs whose nodes no longer have readers.
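The deferred-GC scheme can be sketched as follows, under heavy simplifying assumptions: each node has only a reader count, a `freed` flag, and a single `child` pointer, and GC just walks that chain. `toy_node`, `toy_trunk_context`, `toy_trunk_modify`, and the fixed-capacity queue are hypothetical stand-ins, not SplinterDB's trunk code, and reference counting and concurrency are ignored. The point being illustrated is that GC never waits on a read lock; it parks the node and returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified node: a reader count, a "freed" flag,
 * and one child, so that GC of a node continues down a single chain. */
typedef struct toy_node {
   int              readers;
   bool             freed;
   struct toy_node *child;
} toy_node;

#define TOY_GC_QUEUE_CAP 64

/* Hypothetical stand-in for the trunk context's queue of pending GC tasks. */
typedef struct {
   toy_node *pending[TOY_GC_QUEUE_CAP];
   size_t    count;
} toy_trunk_context;

/* GC starting at `node`. It never waits for readers: on reaching a
 * read-locked node it stops and queues that node for later. */
static void toy_gc_from(toy_trunk_context *ctx, toy_node *node)
{
   while (node != NULL) {
      if (node->readers > 0) {
         assert(ctx->count < TOY_GC_QUEUE_CAP);
         ctx->pending[ctx->count++] = node;
         return;
      }
      node->freed = true;
      node        = node->child;
   }
}

/* Retry every queued GC; nodes that are still read-locked are re-queued. */
static void toy_process_pending_gc(toy_trunk_context *ctx)
{
   toy_node *work[TOY_GC_QUEUE_CAP];
   size_t    n = ctx->count;
   for (size_t i = 0; i < n; i++) {
      work[i] = ctx->pending[i];
   }
   ctx->count = 0;
   for (size_t i = 0; i < n; i++) {
      toy_gc_from(ctx, work[i]);
   }
}

/* Hypothetical trunk modification: first drain the GC queue, then start
 * GC of `victim` (NULL if this modification makes nothing garbage). */
static void toy_trunk_modify(toy_trunk_context *ctx, toy_node *victim)
{
   toy_process_pending_gc(ctx);
   if (victim != NULL) {
      toy_gc_from(ctx, victim);
   }
}
```

If the thread's own async query holds a read lock on a node, the update that triggers GC of that node returns immediately with the node queued instead of waiting on itself. Once the query releases its lock, the next trunk modification frees the node and continues down to its child.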

Fixes #314.

@netlify

netlify bot commented Jul 4, 2025

Deploy Preview for splinterdb canceled.

🔨 Latest commit: bbffceb
🔍 Latest deploy log: https://app.netlify.com/projects/splinterdb/deploys/686739e0449a240008b9c556

@rtjohnso rtjohnso enabled auto-merge (squash) July 4, 2025 01:14
@rtjohnso rtjohnso merged commit 9b289bf into main Jul 4, 2025
16 of 17 checks passed
@rtjohnso rtjohnso deleted the robj/deadlock-fix branch July 4, 2025 03:56

Development

Successfully merging this pull request may close these issues.

Perf: splinter_test --parallel-perf seems to run endlessly, without making any progress. Ran overnight.
